Lecture Lab 7

Nils Hofmann

Which questions you can answer after today’s lecture?

  • What does version control mean?

    • How does Git and GitHub relate to this and what is their difference?
  • Why do Git and teamwork go hand in hand in data science?

  • How to use Git in a team

Things that happen without version control

Version Control - What and why

Think about your own a experiences with coding and how a system could remove these issues. Also what should this system have if you work with others on one project

Collect your ideas at the course Padlet

Version Control - Definition

Version control is like a special “undo” button for your work. It remembers all the changes you make, so you can always go back to an older version if needed. It also lets you and your friends work on the same project without mixing up each other’s changes.

Git - An open source version control system

  • installed locally on your device

  • keeps track of your file changes

    • each change is indexed, so you can revert your current state
  • You interact with it via the terminal

GitHub

  • Hub = Server where you can find, store and manage repositories

    • everything is remote
  • GitHub is NOT a version control system but a platform which provides a GUI for your repository and enables users to share code easily

  • you interact with it via an application or a web browser

Git and GitHub together

  • GitHub allows you to have the project repository remotely

  • Multiple people who have git installed can then access this repository and create a copy

  • GitHub introduces visual features and eases the project management

Git - The basics

Git - simple workflow when working together

If you forget to pull…

  • if you and your teammate work on the same codeline and both push the results to the remote repository, git will detect a merge conflict

Git in RStudio - Buttons are Terminal commands

Git in RStudio - Commit in GUI

Git in RStudio - Version Control in GUI

GitHub - Overview

GitHub - Use the README!

  • README.md is a markdown document which you create to provide a short instruction or overview of your software

  • MUST be in the root directory

Git - working together with clone or fork

  • people involved in one project need to have a copy of the current project state in their local system

    • there are two ways of getting this: cloning and forking

Git - When should I use clone and fork now?

  • it is a question of:

    • How much control do I want?

    • How do I want to continue with the project as a collaborator?

  • If you fork, you will have complete control over the repository and will not directly influence the target repository with git commands

    • BUT your collaborators will not be able to directly apply and use your changes
  • If you plan to contribute to a target repository you typically clone it

Git - Branches

  • branch: new separate and isolated version of the mother branch

    • useful for: experimenting, bug fixing, adding features

    • see it as a structural component for organizing a project

  • git checkout -b : command for switching to another branch

Git - Merge

  • helps to combine changes from two branches into a single branch

  • current branch: The branch to be merged

  • target branch: The branch in which we want to merge the current branch

  • Why F in merged target branch?

    = new commit index

Git - Merge conflicts

  • happens mostly if you make changes in two branches in the same line of a file

    • git will not know what you want

Git/GitHub - Why you should care

  • Teamwork and Git belong together in Data Science/Computer Science

  • GitHub and reproducible data science go hand in hand

  • There will be the moment when git will safe either a lot of time or your a** in a big project

  • Nearly all data science jobs will require knowledge with version control systems

    • Also recruiters can see your projects and skills via GitHub